Abstract: An outage detection framework for power distribution
networks is proposed. Given the tree structure of the distribution
system, a method is developed that combines real-time power
flow measurements on edges of the tree with load forecasts at
the nodes of the tree. A maximum a posteriori (MAP) detector
is formulated for an arbitrary number and location of outages on trees,
and is shown to admit an efficient implementation. A framework relying
on the maximum missed detection probability is used for optimal
sensor placement and is solved for tree networks. Finally, a set of case
studies is considered using feeder data from the Pacific Northwest
National Laboratories. We show that a 10% loss in mean detection
reliability network-wide reduces the required sensor density by 60%
for a typical feeder if the measurements are used efficiently.


I. Introduction 

Outage detection and management has been a long-standing 
problem in power distribution networks. Outages are caused by 
protective devices closing off a part of the network to automatically
isolate faults. Usually, a short circuit fault will trigger this
protective operation. We employ the term outage detection to 
denote the task of finding the status of the protective devices, and 
the term fault detection to denote finding the faults that caused 
the resulting outage situation. 

Many methods for outage and fault detection based on artificial
intelligence have been developed. Outage detection is
often performed prior to fault detection and can greatly improve
the accuracy of fault diagnosis. For outage detection, fuzzy set
approaches have been proposed based on customer calls and
human inspection [?], and based on real-time measurement
with a single sensor at the substation [?]. In networks where
supervisory control and data acquisition (SCADA) systems are
available, the status of a subset of the protective devices can be obtained
via direct monitoring. When two-way communications between the
operator and the smart meters are available, AMI polling has
been proposed to enhance outage detection [?]. There have also
been knowledge-based systems that combine different kinds of
information (customer calls, SCADA, AMI polling) [?]. For fault
detection, using only a single digital transient recording device
at the substation, fault location and diagnosis systems have been
developed based on fault distance computation using impedance
information in the distribution system [?]. Using only the outage
detection results, i.e., the status of the protective relays, expert
systems have been applied to locate the underlying faults [?].
Incorporating voltage measurements in the distribution system
with the outage detection results, fault detection methods based
on knowledge-based systems have been proposed [13]. Fault
detection that uses fault voltage-sag measurements and matching
has been proposed in [?], [?]. Fault diagnosis methods based on fuzzy
systems and neural networks have also been proposed that can
resolve multiple fault detection decisions [?]. Existing outage
and fault detection methods based on artificial intelligence do
not provide an analytical performance metric, so it is in general
hard to examine their optimality. Their performance can, however,
be evaluated numerically and in simulation studies. Moreover,
because of this lack of an analytical metric, while some of
the existing approaches depend on near real-time sensing (e.g.
SCADA), they do not provide guidance on where to deploy the
limited sensing resources within the distribution system.

A major alternative to these mechanisms is the so-called last
gasp, where areas in outage notify the operator, via a distress signal, that
they are out of power. This provides a redundant method of outage
detection which can be combined with the methods proposed here.
In fact, combining both of these methods can further reduce the
time to detect an outage in practical scenarios.

The proposed sensing and feedback framework exploits the 
combination of real-time sensing and feedback from a limited 
number of power flow sensors and the infrequent load updates 
from AMI or forecasting mechanisms. This is practically
possible since there is a growing number of deployments of
distribution system line measurements [?], which can measure
line current with high precision.

II. Problem Formulation and Main Contributions 

Consider a power distribution network that has a tree structure. 
Power is supplied from the feeder at the root, and is drawn by all 
the downstream loads. An outage is a protective device isolating 
a faulted area. When this occurs, the loads downstream of the 
faulted area will be in outage. We investigate the optimal design 
and performance of automatic outage detection systems with the 
use of the following two types of measurements: 

• Noisy Nodal Consumption: typically in the form of forecasts,
which have forecast errors that must be taken into account.

• Error-Free Edge Flows: typically from real-time
SCADA measurements of the power flows on a fraction
of the lines.

The distinction between noisy and error-free measurements arises
because load information arrives with a delay and must be
forecasted, while SCADA systems can communicate in real
time. This work assumes lossless power flow, but it can also be
applied to current measurements at the line and load
level. This is possible since practical distribution line sensing
is accurate in terms of current measurements, and smart meter
interval data provides power and voltage information, making
current inference possible.

The main contributions of this work are the following: 

• Outage Detection: We formulate the problem of detecting
any number of possible outages via nodal and edge measurements
as a general hypothesis testing problem where the
number and locations of outages are unknown. We show that
this general formulation results in a computationally efficient
decentralized hypothesis detector.

• Sensor Placement: We use the decentralized nature of the
detector to provide an optimal sensor placement with respect to
the maximum missed detection error over all hypotheses.

III. System Model and Notation 



Fig. 1. Example tree $T_1$ used to illustrate various properties. Each node in the
network is numbered. Node $v_n$ is connected to its parent via edge $e_n$, and each
node $v$ consumes power $x(v)$. Two flow measurement sensors, $s_0$ and $s_1$, are shown
along with load pseudo-measurements $\hat{x}(v)$.

Topology of the Distribution System: The vertices in the distribution
network are indexed by $V = \{v_0, v_1, \ldots, v_N\}$, with bus
$v_0$ denoting the root of the tree. We index by $e_n$ the line that
connects bus $v_n$ and its parent node.

Outage Hypothesis Model: Outages are modeled as disconnected
edges corresponding to protective devices disconnecting loads on
a network. For example, consider single line outages in a tree
with $N$ edges. In this situation, there exist $N$ single edge
outage hypotheses and a single non-outage hypothesis. Let
$\mathcal{H}_1 = \{e_1, \ldots, e_N\} \cup \{\emptyset\}$ be the set of all single outage
hypotheses for a tree $T$.

We consider a more general case of an unknown number and
location of potential outages. We define $\mathcal{H}_k$, the set of hypotheses
with up to $k$ edge outages, as follows:

$$\mathcal{H}_k = \underbrace{\mathcal{H}_1 \times \cdots \times \mathcal{H}_1}_{k \text{ times}} \tag{1}$$

Load Model: Each node $v$ in the graph has a consumption load
$x(v)$. The forecast of each load is $\hat{x}(v)$ with error $\epsilon(v) =
x(v) - \hat{x}(v)$. We assume the errors are mutually independent random
variables that follow $\epsilon(v) \sim N(0, \sigma(v)^2)$. Given the forecasts, we
treat the true load, which is unknown to us, as a random variable
$x(v) \sim N(\hat{x}(v), \sigma^2(v))$. In the vector case, we have

$$x \sim N(\hat{x}, \Sigma) \tag{2}$$

where we can assume $\Sigma$ is a diagonal covariance matrix.


Measurement Model: For any edge $e$, denote by $s$ the power flow
on it towards all active downstream loads. The measured flow
depends on the network topology, outage situation and the true
loads. The sensor placement is denoted as $\mathcal{M}$ with $\mathcal{M} \subset E$. The
vector of all measurements is $s \in \mathbb{R}^{|\mathcal{M}|}$.

Given a tree $T$, assume hypothesis $H$ corresponds to the outage
of any number of disconnected edges. The measured power
consumption of the $i$-th sensor under hypothesis
$H \in \mathcal{H}_k$ is

$$s_i(H) = \sum_{v \in V_i(H)} x(v), \tag{3}$$

where the set $V_i(H)$ indicates the set of vertices to be summed
over under any particular hypothesis.

A general representation of the observed flow is the following.
Given a particular hypothesis $H \in \mathcal{H}_k$, the full set of flow
observations $s$ can be represented as:

$$s = \Gamma_H x \quad \forall H \in \mathcal{H}_k, \tag{4}$$

where $\Gamma_H \in \{0,1\}^{|\mathcal{M}| \times |V|}$. Here $\Gamma_H$ is generated for each
hypothesis and we assume the forecast error covariance $\Sigma$ can
be estimated from the load forecasting process.
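As an illustration of the flow model in eq. (4), the following minimal Python sketch builds the 0/1 matrix that maps loads to sensor flows for a small tree. The tree, sensor locations, load values and the function names `flow_matrix` and `flows` are hypothetical and chosen only for this example; each edge is identified by its child vertex, and an outage is assumed to zero the contribution of every vertex whose path to the root crosses a disconnected edge.

```python
def flow_matrix(parent, sensors, outage):
    """Return the 0/1 matrix of eq. (4): one row per sensor, one column per vertex.

    parent  : dict vertex -> parent vertex (the root maps to None)
    sensors : list of edges carrying a flow sensor (edge e_n is named by vertex n)
    outage  : set of disconnected edges (the hypothesis H)
    """
    nodes = sorted(parent)
    gamma = [[0] * len(nodes) for _ in sensors]
    for j, v in enumerate(nodes):
        # Collect the edges on the path from v up to the root.
        path, u = [], v
        while parent[u] is not None:
            path.append(u)      # edge e_u joins u to parent[u]
            u = parent[u]
        if any(e in outage for e in path):
            continue            # v is cut off from the root, so it draws no power
        for i, m in enumerate(sensors):
            if m in path:       # v is downstream of the sensor on edge e_m
                gamma[i][j] = 1
    return gamma


def flows(gamma, x):
    """Sensor flows s = Gamma_H x for a list of true loads x."""
    return [sum(g * xv for g, xv in zip(row, x)) for row in gamma]


# Hypothetical 5-vertex tree: 0 is the root, 1 hangs off 0, 2 and 3 off 1, 4 off 3.
parent = {0: None, 1: 0, 2: 1, 3: 1, 4: 3}
sensors = [1, 3]                      # sensors on edges e_1 and e_3
x = [0.0, 1.0, 2.0, 3.0, 4.0]         # illustrative true loads

print(flows(flow_matrix(parent, sensors, set()), x))  # no outage
print(flows(flow_matrix(parent, sensors, {3}), x))    # edge e_3 disconnected
```

With edge $e_3$ disconnected, the downstream sensor reads zero and the upstream sensor only sees the loads that remain energized, which is the behavior exploited later when zero and positive flows are separated.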

IV. General Outage Detection 

Consider the general outage detector. Given the vector of load
forecasts $\hat{x}$, the nominal forecast error covariance $\Sigma$ and the real-time
flows $s$ along a set of branches, the detector must determine the correct
number and location of the edges in outage, $H \in \mathcal{H}_k$.

These are single snapshot values of load forecast and line flow.
A multi-period detection framework can be analyzed in a similar
fashion. We first construct a simple but naive multiple hypothesis
detector relying on a maximum likelihood estimator. Consider the
flow model in eq. (4), relating the true load at each node to the
observed flow on the network.

Given the forecast model in eq. (2) and the hypothesis model
in eq. (4), the maximum a posteriori detector is the following:

$$\{k^*, \hat{H}\} \in \operatorname*{argmax}_{H \in \mathcal{H}_k} \Pr(s \mid \hat{x}, H) \tag{5}$$

See Appendix B for details.

The flow likelihood can be computed as follows:

$$s \mid \{\hat{x}, H\} = \Gamma_H x = \Gamma_H (\hat{x} + \epsilon) \sim N(\Gamma_H \hat{x}, \Gamma_H \Sigma \Gamma_H^T) \quad \forall H \in \mathcal{H}_k \tag{6}$$

Eq. (6) allows us to evaluate a likelihood under each possible
hypothesis. Therefore, a naive detector will enumerate every possible
outage, evaluate its likelihood, and choose the maximum.
This is difficult for the following reasons:

1) For a tree with $N$ edges, the set $\mathcal{H}_k$ is of size $\binom{N}{k}$, so
computing the set $\mathcal{H}_k$ can be very expensive. Evaluating the full maximum
likelihood detector requires $\sum_{k=1}^{N} \binom{N}{k}$ evaluations.

2) Many of the potential hypotheses map to the same observed
flows, therefore the detector output is not unique. This
occurs when one edge is a descendant of another.

3) Missed detection errors in a multivariate hypothesis testing
framework can only be evaluated via Monte Carlo testing.
This gives no insight into optimizing the placement.
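To give a sense of scale for the first difficulty, the short sketch below counts the hypotheses a naive detector would have to evaluate. It assumes the count of $k$-edge outage hypotheses on a tree with $N$ edges is the binomial coefficient $\binom{N}{k}$; the function names are illustrative only.

```python
from math import comb


def num_k_outage_hypotheses(n_edges, k):
    """Number of ways to choose k outaged edges among n_edges."""
    return comb(n_edges, k)


def total_naive_evaluations(n_edges):
    """Likelihood evaluations when every outage size k = 1..n_edges is enumerated."""
    return sum(comb(n_edges, k) for k in range(1, n_edges + 1))


for n in (5, 18, 50):
    print(n, total_naive_evaluations(n))
```

The sum equals $2^N - 1$, so the naive enumeration grows exponentially with the number of edges, which motivates the decoupled detector developed in the next section.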
V. Decoupled Maximum Likelihood Detection

We show that the issues regarding the general maximum likelihood
detector can be overcome by decoupling the hypotheses and
the observations, given the tree structure of the outage detection
problem. This leads to a simple decentralized detector which is
equivalent to eq. (5), where each decision is a scalar hypothesis
test. This in turn leads to efficient hypothesis enumeration, detection
and error evaluation.

In the following sections, we show the following:

1) The original search space $\mathcal{H}_k$ can be replaced by a set $\mathcal{H}_u$
of uniquely detected outages, due to the tree structure of
the network.

2) Processing zero/positive flow information reduces the
search space from $\mathcal{H}_u$ to $\mathcal{H}_u^+$, which decouples into a
product set of local hypotheses:
$\mathcal{H}_u^+ = \mathop{\times}\limits_{A \in \mathcal{A}^+} \mathcal{H}_u^+(A)$,
where $A$ indicates a local area with its own hypothesis
set.

3) The joint likelihood function $\Pr(s \mid \hat{x}, H)$ decouples along
each area.

Combining these results leads to a decentralized detector which
can be solved easily.


A. Unique Outages 

Maximizing the likelihood of observations over the k outage 
set can lead to a non-unique solution. An alternative is to only 
consider uniquely detectable outages, where no possible outage 
event is downstream of any other. 

Define the set of outage hypotheses $\mathcal{H}_u$ ($u$ is for unique) as
follows:

$$\mathcal{H}_u = \{H \in \mathcal{H}_k \text{ for some } k, \text{ s.t. no two edges are descendants of each other}\}. \tag{7}$$

This definition is not constructive, but useful. Consider tree $T_2$
shown in Figure 2(c). Here $\mathcal{H}_u$ can be enumerated by simple
observation.

$$\mathcal{H}_u = \{\emptyset, e_1, e_2, e_3, e_4, e_5, (e_3 \times e_5), (e_4 \times e_5)\} \tag{8}$$

There is a single non-outage hypothesis $\emptyset$, 5 single outage
hypotheses, and 2 double outage hypotheses.

For a more general case (Figure 1), enumerating this set for a
tree can be performed recursively. The set $\mathcal{H}_u$ can be enumerated
using a 'branch-network' based on the original tree. The tree
in Figure 1 is depicted in Figure 2(a) with nodes and edges
removed, which highlights the various branches of the graph. Each
set of branches is aggregated as a node to be traversed in the
hypothesis enumeration procedure.

Fig. 2. (a) Branches of tree $T_1$. (b) Branch network for tree $T_1$. (c) Simple
tree network $T_2$. (d) Branch graph for $T_2$.

Consider a set function $E(b) = \{e \in E : e \text{ is along branch } b\}$
to enumerate the set of edges on a branch. Given the two
examples, we have the following branch-edges:

• $T_1$: $E(b_1) = \{e_1, e_2, e_3\}$, $E(b_2) = \{e_4, e_5, e_6, e_7\}$, $E(b_3) =
\{e_8, e_9, e_{10}, e_{11}, e_{12}\}$, $E(b_4) = \{e_{13}, e_{14}, e_{15}\}$ and $E(b_5) =
\{e_{16}, e_{17}, e_{18}\}$.

• $T_2$: $E(b_1) = \{e_1, e_2\}$, $E(b_2) = \{e_3, e_4\}$, $E(b_3) = \{e_5\}$.

Using this definition, we propose the following recursive definition
of the set $\mathcal{H}_u(b)$, which indicates the set of hypotheses
formed from branch $b$ and all descendants. From this definition,
it is clear that $\mathcal{H}_u = \mathcal{H}_u(b_1)$, since this is the root branch of the
tree.

$$\mathcal{H}_u(b) = E(b) \cup \left( \bigcup_{\mathbf{b} \in P(\text{child}(b))} \; \mathop{\times}\limits_{b' \in \mathbf{b}} \mathcal{H}_u(b') \right) \tag{9}$$

Here $b$ is the current node, and child($b$) is the set of children of $b$
in the branch tree. The set $P(\text{child}(b))$ is the power set of all the
child branches, where any element of the power set is $\mathbf{b}$. Note that
if a branch has no descendants, we merely evaluate $E(b)$, since
the remaining terms are null. Eq. (9) is quite unwieldy, but can
be interpreted easily. For tree $T_2$ in Figure 2(c) the child branches
are child($b_1$) $= \{b_2, b_3\}$ while the power set is:

$$P(\{b_2, b_3\}) = \{\{\emptyset\}, \{b_2\}, \{b_3\}, \{b_2, b_3\}\}. \tag{10}$$

Evaluating (9), we arrive at the following:

$$\mathcal{H}_u(b_1) = E(b_1) \cup \left( \mathop{\times}\limits_{b \in \{\emptyset\}} \mathcal{H}_u(b) \right) \cup \left( \mathop{\times}\limits_{b \in \{b_2\}} \mathcal{H}_u(b) \right) \cup \left( \mathop{\times}\limits_{b \in \{b_3\}} \mathcal{H}_u(b) \right) \cup \left( \mathop{\times}\limits_{b \in \{b_2, b_3\}} \mathcal{H}_u(b) \right) \tag{11}$$

We rely on the following definitions:

D1 Base case: $\mathop{\times}\limits_{b \in \{\emptyset\}} \mathcal{H}_u(b) = \emptyset$.

D2 Hypotheses double counting: $e_i \cup e_i = e_i$, including the
empty set $\emptyset \cup \emptyset = \emptyset$.

D3 Cross product reduction: $\emptyset \times e_i = e_i$, which implies $E(b) \times
\emptyset = E(b)$.

For example, eq. (9) is applied to $T_2$ as follows. For $b_1$, we have
that

$$\mathcal{H}_u(b_1) = \{E(b_1) \cup \mathcal{H}_u(b_2) \cup \mathcal{H}_u(b_3) \cup \mathcal{H}_u(b_2) \times \mathcal{H}_u(b_3)\}. \tag{12}$$

For the branches $b_2$ and $b_3$ we use eq. (9) and D1, resulting in
$\mathcal{H}_u(b_2) = \{E(b_2) \cup \emptyset\}$ and $\mathcal{H}_u(b_3) = \{E(b_3) \cup \emptyset\}$. Using D2
and D3, this results in:

$$\begin{aligned}
\mathcal{H}_u(b_1) &= \{E(b_1) \cup \{\emptyset\} \cup \{E(b_2) \cup \emptyset\} \cup \{E(b_3) \cup \emptyset\} \cup \{E(b_2) \cup \emptyset\} \times \{E(b_3) \cup \emptyset\}\} \\
&= \{\emptyset \cup E(b_1) \cup E(b_2) \cup E(b_3) \cup (E(b_2) \times E(b_3))\} \\
&= \{\emptyset, e_1, e_2, e_3, e_4, e_5, (e_3 \times e_5), (e_4 \times e_5)\}
\end{aligned}$$

which is identical to simple enumeration. At this point, we have
reduced the maximum likelihood detector to the following form:

$$\hat{H} = \operatorname*{argmax}_{H \in \mathcal{H}_u} \Pr(s \mid \hat{x}, H) \tag{13}$$

Note the equality in the maximization, since the restriction of
the search space allows us to find a unique solution. Notice that
this procedure automatically enumerates every possible number
of outages as well as their positions in the graph. Under the
general model of any number of edge outages, this is the complete
enumeration of the hypothesis set, which has a high computational
burden.
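The recursive enumeration of uniquely detectable outages can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: a hypothesis is represented as a frozenset of edge names, the empty frozenset stands for the non-outage hypothesis, and the cross product over every subset of child branches is obtained by taking the product over all children while letting each child contribute its own "no outage" element (which plays the role of the reduction rules D2 and D3). The dictionaries describing $T_2$ follow the branch-edge sets listed in the text; the function name is hypothetical.

```python
from itertools import product


def unique_outages(branch, edges, children):
    """Enumerate the uniquely detectable outage hypotheses rooted at `branch`.

    edges    : dict branch -> list of edge names along that branch
    children : dict branch -> list of child branches in the branch network
    """
    # An outage on this branch masks everything downstream, so each edge of
    # the branch forms a hypothesis on its own.
    hyps = {frozenset([e]) for e in edges[branch]}
    # Otherwise pick one hypothesis (possibly "no outage") from every child.
    child_sets = [unique_outages(c, edges, children)
                  for c in children.get(branch, [])]
    for combo in product(*child_sets):
        hyps.add(frozenset().union(*combo))
    return hyps


# Branch network of the simple tree T_2.
edges = {"b1": ["e1", "e2"], "b2": ["e3", "e4"], "b3": ["e5"]}
children = {"b1": ["b2", "b3"]}

for h in sorted(unique_outages("b1", edges, children), key=lambda s: (len(s), sorted(s))):
    print(sorted(h))
```

For $T_2$ this produces the eight hypotheses of eq. (8): the non-outage hypothesis, five single outages and the two double outages $(e_3 \times e_5)$ and $(e_4 \times e_5)$.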

In the following sections we show that (1) the search space decouples
into a product space of local hypotheses given binary flow
indicators and (2) the likelihood function decouples along these
local 'areas', thereby reducing to a decoupled scalar hypothesis
test for each local area.


B. Reducing Hypotheses from binary flow information 




Fig. 3. (a) The branch network for network $T_1$ with associated sensors $s_1$,
$s_2$ and area hypotheses $\mathcal{H}_u^+(A_1)$ and $\mathcal{H}_u^+(A_2)$. (b) Each area will keep the
local branches. Branch $b_3$ is split between two areas and processed as different
vertices in the branch-network ($b_3^u$ and $b_3^l$). (c) The network can be modeled by a
directed graphical model indicating observations (shaded) and variables that must
be maximized over.


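The detection problem will encounter a set of positive flows $s^+$ as
well as zero flows $s^z$ when a sensor is downstream of an edge outage.
The ML detection observations and search space can be reduced
as:

$$\hat{H} = \operatorname*{argmax}_{H \in \mathcal{H}_u} \Pr(\{s^z, s^+\} \mid \hat{x}, H) \tag{14}$$

$$\phantom{\hat{H}} = \operatorname*{argmax}_{H \in \mathcal{H}_u^+} \Pr(\{s^+\} \mid \hat{x}, H) \tag{15}$$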


By processing the flow information we reduce the set of hypotheses
in the detector from the set $\mathcal{H}_u$ to the reduced set
$\mathcal{H}_u^+$. More importantly, we show that $\mathcal{H}_u^+$ does not require a
recursive enumeration, but actually decouples as a product space
of local hypotheses. This is first shown in an example, then the
general form is stated. Consider $T_1$ and its branch network with flow
measurements in Figure 3(b). In this example, all branches are
unchanged except $b_3$, which is split into $b_3^u$ and $b_3^l$, separated
by flow measurement $s_2$. When splitting branch nodes,
the upper branch takes the edge with the measurement, so
$E(b_3^u) = \{e_8, e_9, e_{10}\}$ and $E(b_3^l) = \{e_{11}, e_{12}\}$.

The following illustrative example highlights a general decoupling
principle of the hypothesis set $\mathcal{H}_u^+$. We consider the two
cases separately ($s_1, s_2 > 0$ and $s_1 > 0, s_2 = 0$), which provides
the intuition for the general case.

1) Case 1: ($s_1 > 0$, $s_2 > 0$) This implies that there cannot
be any outage with edges in $b_1$ or $b_3^u$, else $s_2 = 0$. A brute force
enumeration of $\mathcal{H}_u^+(b_1)$ is done by enumerating $\mathcal{H}_u(b_1)$ according
to Figure 2(b), then removing any terms with $E(b_1)$ and $E(b_3^u)$
outages. It can be shown that:

$$\mathcal{H}_u(b_1) = \{\emptyset \cup E(b_1) \cup E(b_2) \cup (E(b_3^u) \cup \mathcal{H}_u(b_3^l)) \cup E(b_2) \times (E(b_3^u) \cup \mathcal{H}_u(b_3^l))\}$$

Removing the outages ruled out by the positive flow information
(i.e. any tuple with edges in $E(b_3^u)$ or $E(b_1)$), we have:

$$\begin{aligned}
\mathcal{H}_u^+(b_1) &= \{\emptyset \cup E(b_2) \cup \mathcal{H}_u^+(b_3^l) \cup (E(b_2) \times \mathcal{H}_u^+(b_3^l))\} \\
&= \{\emptyset_1 \cup E(b_2)\} \times \{\emptyset_2 \cup \mathcal{H}_u^+(b_3^l)\} \\
&= \mathcal{H}_u^+(A_1) \times \mathcal{H}_u^+(A_2)
\end{aligned}$$

We use $A_1$ and $A_2$ to define local areas. An area is a partition
of the original tree, which will decouple the set of all hypotheses
into a product space of 'area hypotheses'. Each area contains
a root sensor (e.g. $A_1$ contains $s_1$) and a set of descendant sensors
(e.g. child($s_1$) $= \{s_2\}$).

Within a local area, a set of unique outage hypotheses
$\mathcal{H}_u^+(A_i)$ is evaluated which satisfies the binary observations of whether
any flow is observed along the downstream sensors, child($s_i$) $> 0$.
The local hypotheses of each area are later combined to form
any possible hypothesis from the original enumeration of unique
outages. Appendix C contains a more detailed discussion.

2) Case 2: ($s_1 > 0$, $s_2 = 0$) This implies that all possible
outages must contain an edge in $b_1$ or $b_3^u$, else $s_2 > 0$. First we enumerate
$\mathcal{H}_u(b_1)$, then keep only those elements which lead to $s_2 = 0$ (i.e.
every tuple with edges in $E(b_3^u)$ or $E(b_1)$). We have:

$$\begin{aligned}
\mathcal{H}_u^+(b_1) &= \{\emptyset \cup E(b_1) \cup E(b_3^u) \cup E(b_2) \times E(b_3^u)\} \\
&= \{\mathcal{H}_u^+(A_1)\}
\end{aligned}$$

Note that in this case, $\mathcal{H}_u^+(A_2)$ is never evaluated since $s_2 = 0$.
Different positive and zero flow patterns lead to changes in the
local hypothesis set $\mathcal{H}_u^+(A)$.

3) General Hypothesis Decoupling: A complete description of
the set should be $\mathcal{H}_u^+(A, I)$, where $I = \mathbb{1}_{\{\text{child}(s_i) > 0\}}$, since this
set depends on the binary flow information of the child sensors.
For example, Case 2 is $\mathcal{H}_u^+ = \mathcal{H}_u^+(A_1, \{1\ 0\})$ and Case 1 is
$\mathcal{H}_u^+ = \mathcal{H}_u^+(A_1, \{1\ 1\}) \times \mathcal{H}_u^+(A_2, \{1\ 1\})$. For any arbitrary tree
and flow sensors, the unique hypothesis set conditional on binary
flows, $\mathcal{H}_u^+$, will decouple according to:

$$\mathcal{H}_u^+ = \mathop{\times}\limits_{A \in \mathcal{A}^+} \mathcal{H}_u^+(A). \tag{16}$$

Here the set $\mathcal{A}^+$ indicates the areas which have a root measurement
$s_i > 0$. Finally, $\mathcal{H}_u^+(A)$ is the local conditional
hypothesis set, which can be computed in the general case. Any
given $H_i \in \mathcal{H}_u^+$ can be represented by a product of area
hypotheses $H_i = H_{1,i(1)} \times \ldots \times H_{|\mathcal{M}|,i(|\mathcal{M}|)}$, where for the $k$-th area
$H_{k,i(k)} \in \mathcal{H}_u^+(A_k)$. Index $i(k)$ is the particular index into the $k$-th
hypothesis set corresponding to global hypothesis $H_i$. Appendix
D presents a general algorithm to generate the set $\mathcal{H}_u^+(A, I)$ for
arbitrary binary flow information.
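The product structure of eq. (16) can be illustrated with a small Python sketch. The two local hypothesis sets below are hypothetical placeholders (they are not derived from Figure 1); the point is only that every global hypothesis is one choice of local hypothesis per area, so the number of global hypotheses is the product of the local set sizes while the local sets themselves stay small.

```python
from itertools import product


def combine_local_hypotheses(local_sets):
    """Form every global hypothesis as one local hypothesis per area (eq. (16)).

    local_sets : list of lists of frozensets, one list per area with positive
                 root flow; the empty frozenset is the local non-outage case.
    """
    return [frozenset().union(*combo) for combo in product(*local_sets)]


# Hypothetical local hypothesis sets for two areas A_1 and A_2.
area_1 = [frozenset(), frozenset({"e4"}), frozenset({"e5"})]
area_2 = [frozenset(), frozenset({"e11"})]

global_hyps = combine_local_hypotheses([area_1, area_2])
print(len(global_hyps))
for h in global_hyps:
    print(sorted(h))
```

Because the likelihood also factors over areas (next subsection), a detector never needs to materialize this product; it is shown here only to make the decoupling concrete.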

C. Decoupling the Joint Likelihood:

Due to the model of noiseless flow and forecasted nodal
measurements, the joint likelihood of all observations, given $\hat{x}$ and
$H \in \mathcal{H}_u^+$, decouples across the areas. In general we have:

$$\Pr(s^+ \mid \hat{x}, H) = \prod_{i : A_i \in \mathcal{A}^+} \Pr(s_i \mid \text{child}(s_i), \hat{x}, H_i) \tag{17}$$

The tree $T_1$ is presented here as an example, while the general
case can be easily shown as an extension of the example.

Consider again the sub-graph in Figure 3(a), where we have already
reduced the hypotheses using binary flow information (both
$s_1, s_2 > 0$). As discussed in Section V-B3, a unique hypothesis
$H_i \in \mathcal{H}_u^+$ can be represented as $H_i = H_{1,i(1)} \times H_{2,i(2)}$, where
$H_{1,i(1)} \in \mathcal{H}(A_1)$ and $H_{2,i(2)} \in \mathcal{H}(A_2)$.

From eq. (3) the observed flow $s_1$ can be computed as follows:

$$s_1 \mid \{x, H_i\} = \sum_{v \in V(H_{1,i(1)} \times H_{2,i(2)})} x(v) \tag{18}$$

$$= \sum_{v \in V(H_{1,i(1)} \times H_{2,1}) \setminus V(H_{2,1})} x(v) + \sum_{v \in V(H_{2,i(2)})} x(v) \tag{19}$$

$$= \sum_{v \in V(H_{1,i(1)} \times H_{2,1}) \setminus V(H_{2,1})} x(v) + s_2 \tag{20}$$


Where:

• Eq. (18) is the summation of true loads. By decoupling
the hypotheses across areas, we represent $H_i$ as the product
of the two local hypotheses $H_{1,i(1)}$ and $H_{2,i(2)}$.

• In (19), this is separated into the sum of two parts:

1) $V(H_{1,i(1)} \times H_{2,1}) \setminus V(H_{2,1})$ are the vertices in the
summation in $A_1$, independent of what is happening
in downstream areas. $H_{2,1}$ indicates the non-outage
hypothesis in area 2.

2) $V(H_{2,i(2)})$ is the set of vertices in the summation
for area 2. Although $H_{2,i(2)}$ is unknown, $s_2 =
\sum_{v \in V(H_{2,i(2)})} x(v)$, so the unknown hypothesis can
be eliminated.

• Eq. (20) replaces the second summation with the observed
flow, since they are equivalent. Therefore, if we condition on
the remaining flow observations, child($s_1$), the flows decouple
between different areas.

Finally, since $x(v)$ is not known, a likelihood function for the
net flow in the area can be constructed given the load forecasts.
The first term in eq. (20) is modeled with the following:

$$s_1 - s_2 \mid \{s_2, \hat{x}, H\} \sim N(\mu(\hat{x}, H), \sigma(\hat{x}, H)). \tag{21}$$

The quantities $\mu(\hat{x}, H)$ and $\sigma(\hat{x}, H)$ can be computed easily, as
described in Section VI-B.

Similarly, $s_2$ is decoupled from any hypothesis in $\mathcal{H}_u^+(A_1)$,
since the measurement depends only on downstream hypotheses
(assuming $s_2$ has some positive flow to begin with). In this
example we can decouple the likelihood function as follows.

$$\Pr(s_1, s_2 \mid \hat{x}, H_1, H_2) = \Pr(s_2 \mid \hat{x}, H_2) \Pr(s_1 \mid s_2, \hat{x}, H_1)$$


In this example, this decoupling can be represented as a simple
graphical model as shown in Figure 3(c), where each local
hypothesis is an unknown variable that must be determined via
likelihood maximization. Conditioning on the only observations
$s_1, s_2$, the two hypothesis variables are independent. This
graphical model formulation is useful for the general case
where there may be noise in the flow measurements. In such
a case, the decoupling would not work, but a message passing
algorithm can be used. The graphical model formulation can also
be applied to different sensor types. To show the general case, eqs.
(18)-(20) can be extended to a general area network with multiple
downstream sensors.


D. Decoupled Maximum Likelihood Function: 

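We can combine the results shown so far into a decoupled
likelihood function.

$$\{k^*, \hat{H}\} \in \operatorname*{argmax}_{\forall k,\, H \in \mathcal{H}_k} \Pr(s \mid \hat{x}, H) \tag{22}$$

$$= \operatorname*{argmax}_{H \in \mathcal{H}_u} \Pr(\{s^+, s^z\} \mid \hat{x}, H) \tag{23}$$

$$= \operatorname*{argmax}_{H \in \mathcal{H}_u^+} \Pr(s^+ \mid \hat{x}, H) \tag{24}$$

$$= \operatorname*{argmax}_{\forall A_i \in \mathcal{A}^+,\, H_i \in \mathcal{H}_u^+(A_i)} \Pr(s^+ \mid \hat{x}, H_1 \ldots H_M) \tag{25}$$

$$= \operatorname*{argmax}_{\forall A_i \in \mathcal{A}^+,\, H_i \in \mathcal{H}_u^+(A_i)} \prod_{i : A_i \in \mathcal{A}^+} \Pr(s_i \mid \text{child}(s_i), \hat{x}, H_i) \tag{26}$$

$$= \mathop{\times}\limits_{A_i \in \mathcal{A}^+} \operatorname*{argmax}_{H_i \in \mathcal{H}_u^+(A_i)} \Pr(s_i \mid \text{child}(s_i), \hat{x}, H_i) \tag{27}$$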


Here eq. (22) is the original maximum likelihood hypothesis test
over all $\mathcal{H}_k$ outages. This is reduced to a search over $\mathcal{H}_u$
in eq. (23). This further reduces to an even smaller search space
$\mathcal{H}_u^+$ due to processing of binary flow information in eq. (24). Eq.
(25) decouples $\mathcal{H}_u^+$ into a product space of local search hypotheses.
Eq. (26) decouples the likelihood functions by conditioning on
the set of observations. This likelihood function is a product
of terms which only depend on the local hypotheses. Therefore
maximizing this product is equivalent to maximizing each term
separately, as in eq. (27). The decoupling of the centralized likelihood
function in (26) leads to a simple decentralized detector
in Algorithm 1. The input is (1) the set of load forecasts $\hat{x}$ with
their nominal statistics $\Sigma$, and (2) the real-time load information
$s$. The function prune-areas simply discards areas with zero flow.
The function local-hypotheses generates the local
hypothesis set $\mathcal{H}_u^+(A_i)$ as described in Appendix D.

Algorithm 1: Maximum Likelihood Hypothesis Detector.
Result: Maximum Likelihood Hypothesis Detector
Input: [1] Load Forecast/Nominal Statistics: $\hat{x}$, $\Sigma$
       [2] Real Time Load: $s = \{s^+, s^z\}$

$\mathcal{A}^+ \leftarrow$ prune-areas($\mathcal{A}$, $s$)
for $A_i \in \mathcal{A}^+$ do
    // Generate local hypothesis set
    $\mathcal{H}_u^+(A_i) \leftarrow$ local-hypotheses($T$, $s_i$, $d(s_i)$)
    // Local MAP detector
    $\hat{H}_i \leftarrow \operatorname*{argmax}_{H_i \in \mathcal{H}_u^+(A_i)} \Pr(s_i \mid d(s_i), \hat{x}, H_i)$
end
// Combine local hypotheses
$\hat{H} \leftarrow \mathop{\times}\limits_{A_i \in \mathcal{A}^+} \hat{H}_i$

Finally, the local MAP detector is simple to evaluate as a multi-hypothesis
test involving scalar Gaussians of known means and
variances. For a local area, since $\mathcal{H}^+(A)$ is enumerated,
we determine:

$$\hat{H} = \operatorname*{argmax}_{H \in \mathcal{H}^+(A)} \; -\left( \frac{s_i - \sum_{s \in \text{child}(s_i)} s - \mu(\hat{x}, H)}{\sigma(\hat{x}, H)} \right)^2$$

Therefore each decision only depends on an effective measurement

$$\Delta s_i \triangleq s_i - \sum_{s \in \text{child}(s_i)} s \tag{28}$$

for each local area. Computation of the means and variances is
discussed in Section VI-B.
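A minimal sketch of the local scalar test is given below. It assumes each candidate hypothesis in an area comes with a known mean and standard deviation for the effective measurement of eq. (28), and it scores candidates with the full Gaussian log-likelihood, including the $-\log\sigma$ normalization term; that term is an assumption of this sketch and matters only when the candidate variances differ. The hypothesis labels, numbers and function names are hypothetical.

```python
import math


def effective_measurement(root_flow, child_flows):
    """Delta s_i of eq. (28): root flow minus the flows at the child sensors."""
    return root_flow - sum(child_flows)


def local_detector(delta_s, candidates):
    """Pick the local hypothesis with the largest Gaussian log-likelihood.

    candidates : dict hypothesis label -> (mu, sigma) of delta_s under it
    """
    def log_likelihood(mu, sigma):
        # Log of the scalar Gaussian density, dropping the constant 0.5*log(2*pi).
        return -math.log(sigma) - 0.5 * ((delta_s - mu) / sigma) ** 2

    return max(candidates, key=lambda h: log_likelihood(*candidates[h]))


# Hypothetical area: forecast net load of 10 units with no outage,
# 6 units if edge e4 is out, 3 units if edge e3 is out.
candidates = {
    "no outage": (10.0, 1.0),
    "e4 out": (6.0, 0.8),
    "e3 out": (3.0, 0.5),
}

delta_s = effective_measurement(root_flow=8.3, child_flows=[2.0])
print(local_detector(delta_s, candidates))
```

Each area runs this test independently, so the overall cost is the sum, not the product, of the local hypothesis set sizes.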

VI. Sensor Placement Problem 

In evaluating a placement, we use the maximum missed detection
probability over all hypotheses. This is a useful metric
since we are considering a very large number of alternative
hypotheses. First consider the error $P_e(H, \mathcal{M}) =
\Pr(\hat{H} \neq H; \text{placement } \mathcal{M})$, where $\hat{H}$ is the optimal solution
of the outage detection.

This can be used for sensor placement evaluation.

$$\mathcal{M}^* = \operatorname*{argmin}_{|\mathcal{M}| = M} \max_{H} P_e(H, \mathcal{M}) \tag{OPT-1}$$

This is difficult to do outside of a combinatorial enumeration of
sensor locations and hypotheses. A suitable proxy we optimize
instead is the following:

$$\mathcal{M}^* = \operatorname*{argmin}_{|\mathcal{M}| = M} \max_{A \in \mathcal{A}} P_E^{max}(A) \tag{OPT-2}$$

where $P_E^{max}(A) = \max_{H \in \mathcal{H}^+(A, I)} P_e(H, \mathcal{M})$, that is, we only
search over the hypotheses in the local area. This second optimization
very closely approximates an upper bound to (OPT-1)
(see Appendix F for details). Optimization OPT-2 is solved via a
bisection method on the following feasibility problem:

$$\mathcal{M}^* = \text{find } \mathcal{M} \quad \text{s.t. } |\mathcal{M}| \leq M, \;\; P_E^{max}(A) < P^{target} \tag{OPT-3}$$

So the minimum $P^{target}$ is determined which yields a solution
of size $|\mathcal{M}| = M$. This can be solved very efficiently with the
algorithm that follows.
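The bisection over the target error can be sketched as follows. The sketch assumes a feasibility routine `place(target)` that returns a sensor placement meeting the target (such as the bottom-up algorithm of the next subsection) and that the number of sensors it needs does not increase as the target is relaxed; both the routine's signature and the toy stand-in used here are hypothetical.

```python
def min_feasible_target(place, budget, lo=0.0, hi=1.0, tol=1e-6):
    """Bisect on the target error until the placement just fits the sensor budget.

    place  : callable target -> list of sensor locations meeting that target
    budget : maximum number of sensors M
    Assumes place(hi) fits the budget and the sensor count is non-increasing
    in the target.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if len(place(mid)) <= budget:
            hi = mid        # feasible: try a stricter (smaller) target
        else:
            lo = mid        # too many sensors: relax the target
    return hi, place(hi)


def toy_place(target):
    """Stand-in for the feasibility algorithm: fewer sensors for looser targets."""
    return list(range(int(10 * (1 - target))))


target, placement = min_feasible_target(toy_place, budget=4)
print(round(target, 4), len(placement))
```

Each bisection step calls the feasibility routine once, so the total cost is that of the placement algorithm multiplied by roughly $\log_2(1/\text{tol})$ iterations.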

A. Feasibility Placement Algorithm 

The intuition for the greedy placement algorithm is the following.
Starting from the bottom of a tree, we successively
maintain a temporary local area with root sensor in $e_t$. The root
sensor is iteratively moved closer to the root edge $e_1$, while
maintaining that the maximum error of all areas is less than
$P^{target}$. This is done by maintaining that the local area has
error less than $P^{target}$. If this is true we move closer to the
root; if not, we place a sensor and start a new area. Since the
objective function decouples across areas, we can maintain that
the feasibility problem is always satisfied. Finally, if the number
of sensors is less than $M$, $\mathcal{M}^*$ is returned. This framework
is realized in Algorithm 2. The inputs to the method are
the tree network $T$, the set of nominal load forecasts $\hat{x}$ and the
forecast variance $\Sigma$. To have the effect of starting at the leaves of
the network and moving up to the root, the algorithm
processes a sequence of edges $E_{process}$. For example, in Figure 1,
$E_{process} = \{E(b_4), E(b_5), E(b_3), E(b_2), E(b_1)\}$. We generate the
list $E_{process}$ in line 2 with the function generate-edge-order. The
function takes the tree $T$ and traverses it via breadth-first search,
keeping track of the depth of each vertex/edge. Reversing this
list of depths yields a list of nodes to process, the parent edge
being $e \in E_{process}$.

Algorithm 2: Solution to optimization (OPT-3) for tree
network.

Result: Placement for a Tree Network
Input: [1] Tree network $T$
       [2] Nominal load statistics $\hat{x}$, $\Sigma$
       [3] Subproblem ordering $E_{process}$
       [4] Target error $P^{target}$

1  // Generate node process ordering
2  $E_{process} \leftarrow$ generate-edge-order($T$)
3  // Initialize sensor placement as empty
4  $\mathcal{M}^g \leftarrow \emptyset$
5  for $e_t \in E_{process}$ do
6      $A \leftarrow$ construct-area($e_t$, $\mathcal{M}^g$)
7      // Evaluate the current subtree maximum missed detection
8      if $P_E^{max}(A) < P^{target}$ then
9          // continue to next node
10     else
11         if |child($v_t$)| == 1 then
12             $\mathcal{M}^g \leftarrow$ line-action($A$, $\mathcal{M}^g$)
13         else if |child($v_t$)| > 1 then
14             $\mathcal{M}^g \leftarrow$ tree-action($A$, $\mathcal{M}^g$)
15         end
16     end
17 end
18 return $\mathcal{M}^g$
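The bottom-up edge ordering can be sketched with a breadth-first traversal. In this minimal illustration (not the authors' code) each edge is named by its child vertex, the tree is given as a hypothetical `children` dictionary, and the breadth-first visiting order, which is already sorted by depth, is simply reversed so that the deepest edges are processed first.

```python
from collections import deque


def generate_edge_order(children, root):
    """Return edges (named by their child vertex) ordered from leaves to root.

    children : dict vertex -> list of child vertices
    """
    order = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for c in children.get(u, []):
            order.append(c)     # edge e_c connects c to its parent u
            queue.append(c)
    # Breadth-first order is non-decreasing in depth, so reversing it
    # visits every edge before the edge of its parent.
    return list(reversed(order))


# Hypothetical tree: 0 is the root, 1 hangs off 0, 2 and 3 off 1, 4 off 3.
children = {0: [1], 1: [2, 3], 3: [4]}
print(generate_edge_order(children, root=0))
```

Processing edges in this order guarantees that all areas below an edge have already been settled when the edge is visited, which is what the placement loop relies on.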

The greedy solution $\mathcal{M}^g$ must first be initialized as empty. In
line 5 we iterate over the current root edge $e_t$, given the current sensor
placement $\mathcal{M}^g$. In line 6 we construct the current area network $A$.
We then evaluate $P_E^{max}(A)$ in lines 7-8. If $P_E^{max}(A) > P^{target}$ we
perform a placement action, line-action or tree-action, depending
on the number of child nodes of $v_t$ (downstream of $e_t$). Each
sub-function is described in more detail as follows.

1) construct-area: For each iteration, the temporary edge $e_t$
is visited and the area network $A$ is constructed with $e_t$ and the
previous solution $\mathcal{M}^g$. The current edge $e_t$ is the temporary root
sensor of the area, $s_t$. The terminal sensors of the area are any
sensors in $\mathcal{M}^g$ that are children of $s_t$. Note that this set may be
empty at the start of the algorithm.

2) line-action: Given that our current subproblem satisfies
$P_E^{max}(A) < P^{target}$, if the next subproblem does not satisfy the
condition, our only option is to place a sensor at child($v_t$). So
$\mathcal{M}^g \leftarrow \mathcal{M}^g \cup e_t$.

3) tree-action: Assume that the algorithm up to now has placed
sensors on the two disjoint trees with roots $v_1$ and $v_2$. We
must move as far up to the root as possible before we are forced to
place a measurement. This leads to two different types of actions:
a greedy strategy that is easy to implement and the optimal
strategy. The greedy strategy is implemented in practice and is
almost always equal to the optimal strategy, which is discussed
in Appendix H.

The greedy strategy chooses the area network with the smallest
$P_E^{max}(A)$. For example, in Figure 4, first assume $P_E^{max}(A_0)$ and
$P_E^{max}(A'_0) < P^{target}$ and $P_E^{max}(A_1) > P^{target}$. In moving $e_t$
closer to the root, we must choose either $A_2$ or $A_3$ based on
the placement which has the smallest error.

Fig. 4. (greedy-tree-action) Correct node traversal in the tree network assumes
that $P_E^{max}(A_0) < P^{target}$ and $P_E^{max}(A'_0) < P^{target}$. If $P_E^{max}(A_1) < P^{target}$ we
do nothing. Else, we generate and evaluate the error of one of the $2^{|\text{child}(v_t)|} - 1$
area networks that can be constructed, for example $A'$ and $A''$.


B. Evaluating $P_E^{\max}(A)$

The objective $P_E^{\max}(A_i)$ is computed as follows. For each local hypothesis $H \in \mathcal{H}^+(A_i, \mathbf{f})$, the conditional distribution $\Delta s_i \mid \{\mathbf{x}, H\} \sim N(\mu(\mathbf{x}, H), \sigma^2(\mathbf{x}, H))$ is computed.

To compute this, we introduce $W_\mu(v) = \sum_{u \in \text{desc}(v)} \mu(u)$, which computes the cumulative load forecast of all descendant vertices. Similarly, $W_\sigma(v) = \sum_{u \in \text{desc}(v)} \sigma^2(u)$ for the forecast variances. For a given area, we evaluate $\Delta s_i$ under hypothesis $H$, assuming the root $s_i$ and child sensor locations child($s_i$), for a particular observed flow:

$$\mu(\mathbf{x}, H) = W_\mu(s_i) - \sum_{e \in V(H)} W_\mu(e) - \sum_{e \in \text{child}(s_i)} W_\mu(e) \tag{29}$$
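
The cumulative quantities $W_\mu$ and $W_\sigma$ can be obtained in a single bottom-up pass over the tree. The sketch below is illustrative only: the function names are hypothetical, each edge is identified with its downstream vertex, and it assumes that desc($v$) includes $v$ itself, which the text does not state explicitly. The helper `area_mean` mirrors the structure of (29): root total, minus the subtrees behind outaged edges, minus the subtrees behind the child sensors.

```python
from typing import Dict, Hashable, Iterable, List


def cumulative_forecast(
    children: Dict[Hashable, List[Hashable]],
    value: Dict[Hashable, float],
    root: Hashable,
) -> Dict[Hashable, float]:
    """Return W[v] = value[v] + sum of value over all descendants of v.

    Assumption: desc(v) is taken to include v itself.
    """
    order = []
    stack = [root]
    while stack:  # iterative depth-first traversal, parents before children
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))
    W: Dict[Hashable, float] = {}
    for v in reversed(order):  # every child is handled before its parent
        W[v] = value[v] + sum(W[c] for c in children.get(v, []))
    return W


def area_mean(
    W: Dict[Hashable, float],
    root: Hashable,
    terminal_sensors: Iterable[Hashable],
    outaged: Iterable[Hashable],
) -> float:
    """Mean of the area flow: root total minus metered and outaged subtrees."""
    return W[root] - sum(W[e] for e in outaged) - sum(W[e] for e in terminal_sensors)


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3, 4]}
    mu = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}
    W_mu = cumulative_forecast(tree, mu, root=0)
    print(W_mu[0], area_mean(W_mu, 0, terminal_sensors=[3], outaged=[2]))
```

The same routine applied to the per-vertex variances gives $W_\sigma$, so both moments of the area distribution come from two passes over the tree.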


Note that this is computed for a particular binary flow pattern. Therefore, we do not consider any outage edge upstream of the child sensors. Likewise, $\sigma^2(\mathbf{x}, H)$ is computed via $W_\sigma$.

Finally, given the scalar distribution of $\Delta s_i \mid \{\mathbf{x}, H\}$, the probability of missed detection for a scalar maximum likelihood detector can be computed.

C. Optimality and Complexity

This discussion concerns only optimal-tree-action since it guarantees optimality, although the greedy strategy output is almost always identical.

Theorem 1. The bottom up placement solution $\mathcal{M}^g$ relying on optimal-tree-action traversal solves OPT-3.

The complexity of evaluating the detector and the objective, as well as that of the placement algorithm, is discussed in Appendix E. The main results are summarized as follows:

• The worst case complexity of evaluating the detector for an area and its missed detection $P_E^{\max}(A)$ for an area of size $|E|$ is $O(4^{|E|})$. Evaluating only outages of size $k$ is $O(|E|^k)$.

• Given a fixed cost of evaluating $P_E^{\max}(A)$, the greedy placement algorithm is of $O(|E|)$ complexity while the optimal strategy is of $O(4^{\log |E|})$ complexity.

The detector and placement complexity in the worst case is quite poor. However, in practice this cost is averted by using a detector with a fixed hypothesis size. The outage model assumes that each edge has some finite prior likelihood of outage. Therefore a multi-edge outage of large size is much less likely. For this reason, not all $k$-outages need to be enumerated. In practice a single edge outage per area may be sufficient. Using a single-outage detector with greedy placement is extremely efficient (only $O(|E|^2)$ complexity).


VII. Distribution System Case Study

For the remaining case studies only single area outages are considered.


A. PNNL Case Study

[Figure 5 plot legend: R1-12.47-1, R2-12.47-3, R5-12.47-1, R5-12.47-4, R5-25.00-1]
Fig. 5. Outage detection performance for selected PNNL feeders. Square marker denotes specified error target for optimization. Circular marker denotes empirical mean missed detection error for hypothesis set.


We perform outage detection using a subset of the Pacific Northwest National Laboratory test feeders CD. Table I gives an overview of the feeders chosen for the simulation study. The primary applications of the feeders are heavy to light urban networks, as well as suburban and rural networks. The climate zones refer to (1) temperate, (2) hot/arid, (3) cold, (4) hot/cold and (5) hot/humid according to HD.

TABLE I

PNNL Test feeders used in case study

Network       Voltage     Climate Zone   Type         Size
R1-12.47-1    12.5 kV     1              suburban     613
R2-12.47-3    12.47 kV    2              urban        52
R5-12.47-1    13.8 kV     5              urban        265
R5-12.47-4    12.47 kV    5              commercial   643
R5-25.00-1    22.9 kV     5              suburban     946


1) Outage Model: For the PNNL feeders, outages are simulated by fuses and switches disconnecting the downstream loads from the substation feeders. For each network, all fuses and switches are reduced to edges in the general tree network. The set of loads disconnected by a given fuse or switch is lumped into an aggregate load. The mean load of each such group can vary.

























2) Forecast Error Model: In [12] the authors present a rule-of-thumb model for day-ahead load forecasting at various aggregation levels based on smart meter data. The day-ahead forecast coefficient of variation, $\kappa = \sigma / \mu$, is shown to depend on the mean load of the group. Many studies make simplified assumptions on the relative forecast error. However, at the level of small aggregates, the forecast $\kappa$ can vary greatly with the size of the aggregate and must be taken into account. A reasonable fit shown in [12] is $\kappa(W) = \ldots + 41.9$. This formula is used to show that each set of islanded loads will have a different value of $\kappa(W)$.

Figure 5 shows the application of the sensor placement algorithm to each network. Even though each network represents a different application, they show somewhat similar performance in terms of placement density. Averaged over the network configurations, a 10% mean missed detection error is attainable with 30% sensor density. Seen another way, we can reduce the real-time monitoring of fuses by 70% by tolerating a small amount of error in the outage decision.


B. General Line and Tree Network Sensitivity

Figure 6 shows the sensitivity of the line and tree networks under different simulation parameters. Both networks have 100 nodes; the tree was generated using the method in ED. In an ideal line network with extremely high forecast accuracy ($\kappa = 1\%$), 1 or 2 sensors are required for extremely low missed detection errors. This extreme situation does not occur in practice, but serves as a baseline for realistic networks. From Figure 6(a) we see that the required sensor density decreases quite quickly versus $P^{\text{target}}$. The relation between sensor density and $P^{\text{target}}$ is smoothly decaying. In comparison, randomly generated tree networks require on average 2 to 3 times as many sensors to achieve the same error target.

Fig. 6. The effect of optimization error target $P^{\text{target}}$ and relative forecast error $\kappa$ on both line (6(a)) and tree (6(b)) networks.


C. Missed Detection Error

Optimization (OPT-2) is meant to minimize the maximum missed detection error among all possible hypotheses. This can clearly be too conservative a requirement. Therefore it is useful to understand the nature of the actual hypothesis missed detection values that arise from a given sensor placement.

Figure 7(a) shows the distribution of missed detection probabilities for a tree network. Setting $P^{\text{target}} = 0.2$ and $\kappa = 0.3$, we record the value of each hypothesis error. The empirical maximum error is close to the target 0.2. This makes sense because in successively solving the feasibility problem, we expand the area network until the maximum error surpasses $P^{\text{target}}$. However, we see that in fact almost all of the missed detection probabilities are less than the target. For this example in particular, 34% of the hypotheses have an error below $10^{-3}$, which is essentially zero.

Comparing the mean and maximum errors over the range of achievable values of $\kappa$ and $P^{\text{target}}$, the mean error is on average 25% of $P^{\text{target}}$. The maximum error in the network tracks $P^{\text{target}}$ very closely; therefore the optimization yields a very tight result.

Fig. 7. Hypothesis missed detection analysis: 7(a) error histogram for $P^{\text{target}} = 0.2$; 7(b) tree network reduction to mean error.


VIII. Conclusion

We propose an outage detection framework combining power flow measurements on edges of the distribution system with consumption forecasts at nodes of the network. We formulate the detection problem and provide an optimal placement for the maximum missed detection error metric. Finally, relying on feeder information from the Pacific Northwest National Laboratory as well as a forecast error scaling law derived from Pacific Gas and Electric smart meter data, we demonstrate our formulation.
Appendix


A. Nomenclature Table

Symbol                                   Description
$T$                                      Tree network representation of a distribution feeder.
$(V, E)$                                 Vertex and edge set of tree $T$.
$V(H)$                                   Set of vertices that are connected to root under outage hypothesis $H$.
desc($v$)                                Descendants of vertex $v$.
child($v$)                               Children of vertex $v$.
...                                      Scalar load and forecast value at vertex $v$.
...                                      Forecast residual for load $l(v)$.
...                                      Forecast residual variance and covariance matrix.
$\mathcal{H}^1$                          Single outage hypothesis, element and set.
$\mathcal{H}^k$                          $k$-outage hypothesis.
$\mathcal{H}_u$                          Unique hypothesis where no edges are downstream of any others.
$A$, $\mathcal{A}$, $A^+$                Local area network $A \in \mathcal{A}$; pruned areas under binary flow processing.
$\mathcal{H}^+(A_k, \mathbf{f})$         Set of local area hypotheses post binary flow processing.
$H_{k, i(k)}$                            $i(k)$-th hypothesis in area $A_k$. Used to reconstruct global hypothesis $H_i$.
$\mathcal{M}$                            Sensor placement ($\mathcal{M} \subset E$).
$\mathbf{s}$                             Set of observations on edges of network.
$r_{i,j}$                                Acceptance region for pairwise test of hypotheses $H_i$ and $H_j$.
$R_{\mathcal{H}}(H)$                     Acceptance region of hypothesis $H$ over all alternatives in $\mathcal{H}$.
$P_{E,C}(H, \mathcal{M})$                Probability of error (E) and correct detection (C) for hypothesis $H$, under placement $\mathcal{M}$.
$P_E^{\max}(A)$                          Maximum probability of incorrect detection over all hypotheses in area $A$.
$P_C^{\min}(A)$                          Minimum probability of correct detection over all hypotheses in area $A$.


B. MAP Detection for Outage Hypotheses

Here we show how the general MAP detector rule can be evaluated when we combine edge flows $\mathbf{s}$, load forecasts $\mathbf{x}$ and candidate outages $H$.

$$
\begin{aligned}
\hat{H} &= \arg\max_{H \in \mathcal{H}^k} \Pr(H \mid \mathbf{s}, \mathbf{x}) && (30) \\
&= \arg\max_{H \in \mathcal{H}^k} \frac{\Pr(\mathbf{s}, \mathbf{x} \mid H) \Pr(H)}{\Pr(\mathbf{s}, \mathbf{x})} && (31) \\
&= \arg\max_{H \in \mathcal{H}^k} \Pr(\mathbf{s}, \mathbf{x} \mid H) \Pr(H) && (32) \\
&= \arg\max_{H \in \mathcal{H}^k} \Pr(\mathbf{s} \mid \mathbf{x}, H) \Pr(\mathbf{x} \mid H) \Pr(H) && (33) \\
&= \arg\max_{H \in \mathcal{H}^k} \Pr(\mathbf{s} \mid \mathbf{x}, H) \Pr(\mathbf{x}) \Pr(H) && (34) \\
&= \arg\max_{H \in \mathcal{H}^k} \Pr(\mathbf{s} \mid \mathbf{x}, H) \Pr(H) && (35) \\
&= \arg\max_{H \in \mathcal{H}^k} \Pr(\mathbf{s} \mid \mathbf{x}, H) && (36)
\end{aligned}
$$

Lines (30)-(32) convert the MAP detector to a likelihood detector with prior weights. Line (33) conditions on the load forecast $\mathbf{x}$. Since $\mathbf{x}$ does not depend on the outage hypothesis (only $\mathbf{s}$ does), the term $\Pr(\mathbf{x})$ can be removed, leading to (35). In (36), we assume a uniform prior over all hypotheses; however, this does not have to be the case.

Given the assumption that each edge goes into outage with some fixed prior probability, a single edge outage hypothesis should have $\Pr(H) = p$, while a $k$-outage condition should have a prior of $\Pr(H) = p^k$. This motivates enumerating fewer outage hypotheses when evaluating $\mathcal{H}_u$ in practice.
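
As a small illustration of how such a prior enters the decision rule (35), the sketch below weights each hypothesis log-likelihood by $k \log p$, where $k$ is the number of outaged edges. The function name `map_hypothesis` and the example numbers are hypothetical, and the prior is taken to be exactly $p^k$ as in the text (factors of $1 - p$ for the remaining edges are ignored).

```python
import math
from typing import Sequence


def map_hypothesis(
    log_likelihoods: Sequence[float],
    outage_sizes: Sequence[int],
    p: float,
) -> int:
    """Return the index of the MAP hypothesis under the prior Pr(H) = p**k.

    log_likelihoods[i] is log Pr(s | x, H_i) and outage_sizes[i] is the
    number of outaged edges k in hypothesis H_i.
    """
    scores = [ll + k * math.log(p) for ll, k in zip(log_likelihoods, outage_sizes)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # A double outage that fits the data slightly better than a single outage.
    log_liks = [-3.0, -2.5]
    sizes = [1, 2]
    print(map_hypothesis(log_liks, sizes, p=0.1))  # prior favours the single outage
    print(map_hypothesis(log_liks, sizes, p=1.0))  # uniform weights: pure likelihood
```

With a small $p$, a multi-edge hypothesis must fit the measurements much better than a single-edge one to be selected, which is the reason large outage sets can often be left out of the enumeration.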


we have that $\mathcal{H}^+(\{1\ 0\}) = \{E(b_1)\}$. We see that splitting the hypotheses based on flow information conserves the search space, since $\mathcal{H}^+(\{1\ 1\}) \cup \mathcal{H}^+(\{1\ 0\}) = \mathcal{H}_u$.

4) Tree $T_1$: For tree $T_1$, we have:

$$\mathcal{H}_u(b_1) = \{\emptyset \cup E(b_1) \cup E(b_2) \cup \mathcal{H}_u(b_3) \cup E(b_2) \times \mathcal{H}_u(b_3)\}$$
$$\mathcal{H}_u(b_3) = \{\emptyset \cup E(b_3) \cup E(b_4) \cup E(b_5) \cup \{E(b_4) \times E(b_5)\}\}$$


C. Extended Discussion of Recursive Evaluation of $\mathcal{H}_u$


D. General Hypothesis Decoupling 



We present here additional worked-out examples that show how the recursive definition can enumerate all hypotheses, and focus on some corner cases that must be defined. Recall the recursive definition:

$$\mathcal{H}_u(b) = E(b) \cup \ldots$$

1) Example 1 (Figure 8(a)): This is the simplest case to evaluate and is:

$$\mathcal{H}_u = E(b_1) \cup \emptyset$$

The null hypothesis set arises from evaluating $P(d(b_1)) = \emptyset$, since $b_1$ has no children.

2) Example 2 (Figure 8(b)): This is the simplest case to evaluate and is:

These two cases provide the intuition for a general procedure, which is as follows. Given binary information from flows, all areas $A_k$ whose root sensor has $s_k = 0$ are discarded in generating a local hypothesis. Each node in the branch graph is assigned a label $l \in L$, $L = \{P, Z, U\}$, for (P) positive, (Z) zero, and (U) undetermined branches. These are defined as follows:

Positive Branch: This branch is upstream from a sensor measuring positive flow, and therefore can never be included in any outage hypothesis. Its immediate parent branch cannot be enumerated either.

Zero Branch: This branch is directly upstream from a zero measurement, therefore its edges must always be enumerated in any outage hypothesis.

Undetermined Branch: This branch has no information, so it is enumerated without any restriction.

This definition leads to the following procedure to enumerate $\mathcal{H}^+(A_i, I\{\text{child}(s_i) > 0\})$. First, each branch-node is labeled with the following procedure:

Initialization: A branch with a descendant sensor is assigned (1) label P if $s > 0$ and (2) label Z if $s = 0$; (3) a branch with no descendants is assigned label U.

Update: Given a current branch node $b$ and the set of its children, the node is assigned as follows: (1) If any descendant node is labelled P then it must be labelled P. (2) If the descendants are U and Z, then it must be labelled U.


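$$
\begin{aligned}
\mathcal{H}_u &= \{E(b_1) \cup \emptyset \cup \mathcal{H}_u(b_2) \cup \mathcal{H}_u(b_3) \cup \mathcal{H}_u(b_2) \times \mathcal{H}_u(b_3)\} \\
&= \{E(b_1) \cup \emptyset \cup \{\emptyset \cup E(b_2)\} \cup \{\emptyset \cup E(b_3)\} \cup \{\emptyset \cup E(b_2)\} \times \{\emptyset \cup E(b_3)\}\} \\
&= \{\emptyset \cup E(b_1) \cup E(b_2) \cup E(b_3) \cup (E(b_2) \times E(b_3))\}
\end{aligned}
$$

This example is reduced to its final form in Section V-A. However, the following equalities are omitted in the enumeration:

$$\emptyset \cup \emptyset = \emptyset \tag{37}$$
$$\emptyset \times \emptyset = \emptyset \tag{38}$$
$$\emptyset \times \mathcal{H} = \mathcal{H} \tag{39}$$

The final relation leads to $\emptyset \times E(b) = E(b)$.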
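
The enumeration in Examples 1 and 2 can be reproduced with a short recursive sketch. It is a simplification made for illustration: each branch is treated as a single edge, hypotheses are represented as frozensets of branch names, and the function name `unique_hypotheses` is hypothetical. The empty frozenset plays the role of $\emptyset$, so the identities (37)-(39) hold automatically through set union.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Set


def unique_hypotheses(
    children: Dict[str, List[str]], branch: str
) -> Set[FrozenSet[str]]:
    """Enumerate outage hypotheses in the subtree of `branch`.

    A hypothesis is a set of branches in which no branch is downstream of
    another. The result always contains the null hypothesis (empty set).
    """
    # Each child subtree contributes its own hypotheses, including the null one.
    child_sets = [unique_hypotheses(children, c) for c in children.get(branch, [])]
    result = {frozenset([branch])}  # outage of this branch alone
    for combo in product(*child_sets):  # one choice per child subtree
        result.add(frozenset().union(*combo))
    return result


if __name__ == "__main__":
    example_2 = {"b1": ["b2", "b3"]}
    for h in sorted(unique_hypotheses(example_2, "b1"), key=lambda s: (len(s), sorted(s))):
        print(sorted(h))
```

For the tree of Example 2 this yields the five hypotheses $\emptyset$, $b_1$, $b_2$, $b_3$ and $b_2 \times b_3$, matching the reduced form above.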

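3) Example 3 (Figure ...): Consider the two binary flow indicators $I\{s_1, s_2 > 0\} = \{1\ 1\}$ and $\{1\ 0\}$. In the first case, we have the following product set:

$$
\begin{aligned}
\mathcal{H}^+(\{1\ 1\}) &= \{E(b_2) \cup \emptyset_1\} \times \{E(b_3) \cup \emptyset_2\} \\
&= \{E(b_2) \times E(b_3) \cup \emptyset_1 \times E(b_3) \cup E(b_2) \times \emptyset_2 \cup \emptyset_1 \times \emptyset_2\} \\
&= \{E(b_2) \times E(b_3) \cup E(b_2) \cup E(b_3) \cup \emptyset\}
\end{aligned}
$$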


Once the branch-nodes are labeled, enumerating $\mathcal{H}^+(A_i, I\{\text{child}(s_i) > 0\})$ can be done recursively using the following rules:

Positive Rule: Never enumerate a branch ($E(b)$) with positive flow label (P).

Zero Rule: When evaluating the recursive definition $\mathcal{H}(b)$ on an element of the power set, if any descendant is labelled Z, only evaluate product set elements that contain this branch.


We use the fact that $\emptyset \times E(b) = E(b)$ and that the product $\emptyset_1 \times \emptyset_2 = \emptyset$, which is the global null hypothesis from the naive enumeration in Example 2. Similarly, enumerating the $\{1\ 0\}$ case,

Fig. 9. 9(a) General network reduced to individual branches. 9(b) Worst case tree network of depth $D$ and $K$ children for each vertex.
TABLE II

Binary Flow $I\{\text{child}(s_i) > 0\}$   Branch Labels    Hypotheses Branches
0,0                                        U, Z, U, U, Z    $b_1$, $b_2$, $b_2 \times b_3$, $b_2 \times b_5$, $b_2 \times b_4 \times b_5$
0,1                                        P, Z, P, U, P    $b_2$, $b_2 \times b_4$
1,0                                        P, P, U, U, Z    $b_3$, $b_5$, $b_4 \times b_5$
1,1                                        P, P, P, U, P    $b_4$


An example local area is provided in Figure 9(a). The general method is applied under each of the binary flow cases, and the resulting branches to be enumerated are given in Table II.


E. Complexity Analysis 

In analyzing the complexity of the various algorithms we assume the tree in Figure 9(b). Each vertex has $K$ children and the tree is of depth $D$. It can be shown that the number of edges is related to these quantities by $|E| = \frac{K^D - 1}{K - 1}$.

1) Evaluating $P_E^{\max}(A)$: For simplicity, we focus on a binary tree, so $K = 2$ and $|E| = 2^D - 1$. The number of possible hypotheses at each depth, $C(d)$, is related recursively as follows:

$$
\begin{aligned}
C(d+1) &= \sum_{n=1}^{K} \binom{K}{n} C(d)^n && (40) \\
&= (C(d) + 1)^K - 1 && (41)
\end{aligned}
$$

This can be derived directly from (9).

In the binary tree case, it is simple to show that $C(d) = 4^{2^d} - 1$. Therefore the number of hypotheses is doubly exponential in the depth of the tree. For the entire tree, this leads to the root node value of $C(D) = 4^{2^D} - 1$, which is $4^{(|E|+1)} - 1$. Therefore $|\mathcal{H}_u| = O(4^{|E|})$, exponential in the size of the graph.
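
The recursion (40)-(41) is easy to check numerically. The sketch below is illustrative: the function names are hypothetical, and the base case $C(0) = 3$ is an assumption chosen so that the binary-tree closed form $4^{2^d} - 1$ holds; the text does not state the base case.

```python
from math import comb


def hypothesis_count(depth: int, k: int = 2, base: int = 3) -> int:
    """Iterate C(d+1) = (C(d) + 1)**k - 1 starting from C(0) = base.

    The base case is an assumption made for this sketch.
    """
    c = base
    for _ in range(depth):
        c = (c + 1) ** k - 1
    return c


def binomial_form(c: int, k: int) -> int:
    """Right-hand side of (40): sum over n = 1..k of (k choose n) * c**n."""
    return sum(comb(k, n) * c ** n for n in range(1, k + 1))


if __name__ == "__main__":
    for d in range(4):
        print(d, hypothesis_count(d))
```

The printed counts grow by squaring at every level, which is the doubly exponential growth in depth described above.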

This cost may be averted due to the following:

• Fixed multi-hypothesis size. The MAP detector requires a prior probability of hypotheses. Since multiple edge outages are less likely, they do not always have to be enumerated. For example, considering only single edge outages leads to $|\mathcal{H}_u| = O(|E|)$ complexity for an area.

• Small area sizes and binary flow segmentation. The number of hypotheses is exponential in $|E|$ for an area. Given many sensors, the number of edges per area can be reduced considerably. Additionally, the binary flow information from downstream sensors will on average divide each $\mathcal{H}^+(A)$ by a factor of $2^{|d(s)|}$.


The following analysis is in terms of the evaluation of $P_E^{\max}(A)$, since we assume an appropriate approximation of this function has been performed.

2) Evaluation of Algorithm 2 using greedy strategy: The greedy strategy will have to evaluate all $2^K$ subproblems at each vertex, and choose the minimum $P_E^{\max}(A)$. The worst case complexity is therefore $O(|E| 2^K)$, which reduces to $O(|E|)$, since $K$ is a constant.

3) Evaluation of Algorithm 2 using optimal strategy: The optimal strategy will expand the problem size by a factor $2^K$ at each vertex. The number of sub-problems to consider after a depth of $D$ will be $(2^K)^D$, which for a binary tree becomes $O(4^{\log |E|})$.


F. Proxy Function Optimization 

The placement problem OPT-1 will output the optimal sensor locations, minimizing the maximum missed detection error:

$$\alpha^*(\mathcal{M}^*) = \min_{|\mathcal{M}| = M} \max_{H \in \mathcal{H}_u} P_E(H, \mathcal{M})$$

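This can be approximated by using the decoupling of hypotheses in different areas. Recall that the solution to OPT-1, repeated above, is the distributed detector in Algorithm 1, where each area produces a local hypothesis estimate $\hat{H}_1, \ldots, \hat{H}_M$. The complete hypothesis is only correct if every local detection output is correct. So for any hypothesis, we have the following lower bound:

$$
\begin{aligned}
\min_{H \in \mathcal{H}_u} \Pr(\hat{H} = H; \mathcal{M}) &= \min_{H \in \mathcal{H}_u} \prod_{\forall A_i, H_i} \Pr(\hat{H}_i = H_i) && (42) \\
&\ge \prod_{\forall A \in \mathcal{A}} \min_{H \in \mathcal{H}(A)} \Pr(\hat{H} = H) && (43) \\
&= \prod_{\forall A \in \mathcal{A}} P_C^{\min}(A). && (44)
\end{aligned}
$$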

Line (42) follows from the decoupling of the decentralized detector. The overall MAP decision can be correct only if each local MAP decision is correct. For any $H \in \mathcal{H}_u$ the probability of each area making a correct decision is always greater than the worst case probability of correct decision for each area. We interchange the sensor placement and area notation since local areas are constructed from sensor placements. Here $P_C^{\min}(A)$ is the minimum probability of correct detection within a local area $A$. This lower bound can be used to first upper bound the optimal $\alpha^*$. Finally, only an approximate solution to the upper bound is formulated.


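$$
\begin{aligned}
\alpha(\mathcal{M}^*) &= \min_{|\mathcal{M}| = M} \max_{H \in \mathcal{H}_u} P_E(H, \mathcal{M}) && (45) \\
&= \min_{|\mathcal{M}| = M} \max_{H \in \mathcal{H}_u} \left(1 - P_C(H, \mathcal{M})\right) && (46) \\
&= 1 - \max_{|\mathcal{M}| = M} \min_{H \in \mathcal{H}_u} P_C(H, \mathcal{M}) && (47) \\
&\le 1 - \max_{|\mathcal{M}| = M} \prod_{\forall A \in \mathcal{A}} P_C^{\min}(A) && (48) \\
&\approx 1 - \max_{|\mathcal{M}| = M} \min_{\forall A \in \mathcal{A}} P_C^{\min}(A) && (49) \\
&= 1 - \max_{|\mathcal{M}| = M} \min_{\forall A \in \mathcal{A}} \left(1 - P_E^{\max}(A)\right) && (50) \\
&= \min_{|\mathcal{M}| = M} \max_{\forall A \in \mathcal{A}} P_E^{\max}(A) && (51)
\end{aligned}
$$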


Optimization OPT-1 is identical to (46), since the probability of a single hypothesis error can be exchanged for its complement. The min-max to max-min change is due to the negative sign in (47). In line (48), instead of maximizing the minimum correct probability over all hypotheses, we maximize a computationally tractable lower bound, which is the product $\prod_{\forall A \in \mathcal{A}} P_C^{\min}(A)$. In (49) we introduce a close approximate solution, which is the following: instead of maximizing the product of $P_C^{\min}(A)$ over all $A$, it is sufficient to maximize the minimum of the $P_C^{\min}(A)$. Experimentally the two solutions have been shown to be identical for a large number of instances, and only sub-optimal in a small number of cases where the gap is small.
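
The relation between the product objective in (48) and the min-max proxy in (49)-(51) can be seen on a toy set of area errors. The sketch below is only illustrative; the function names and the numbers are hypothetical. Since every factor $1 - P_E^{\max}(A)$ lies in $[0, 1]$, the product bound on the error is never smaller than the largest single area error.

```python
import math
from typing import Sequence


def product_bound(area_errors: Sequence[float]) -> float:
    """Upper bound on the error from (48): 1 - prod over areas of (1 - P_E^max(A))."""
    return 1.0 - math.prod(1.0 - e for e in area_errors)


def minmax_proxy(area_errors: Sequence[float]) -> float:
    """Proxy objective from (51): the largest area error."""
    return max(area_errors)


if __name__ == "__main__":
    errors = [0.10, 0.05, 0.02]  # hypothetical P_E^max(A) for three areas
    print(minmax_proxy(errors), product_bound(errors))
```

The proxy ignores how the remaining areas contribute to the product, which is why the two objectives can select different placements in a small number of cases.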
Fig. 10. 10(a) Brute force placement evaluation where the two objectives are identical. 10(b) Brute force placement evaluation where the two objectives differ.


Figure 10 shows a pair of randomly generated trees with $N = 15$ nodes with random loads and a forecast coefficient of variation of 0.02. In both cases, the bottom up placement was used to determine $\mathcal{M}^g$, where $|\mathcal{M}^g| = 5$.

A brute force enumeration of all placements of this size is evaluated for $\max_{\forall A \in \mathcal{A}} P_E^{\max}(A, \mathcal{M})$ and $\prod_{\forall A \in \mathcal{A}} P_C^{\min}(A, \mathcal{M})$.

In both cases, there is a strong correlation between the two solutions. The two solutions are not, however, always equal: in Figure 10(a) the solutions are identical, while in Figure 10(b) the two solutions differ by 7.2%. Intuitively, they should be very close to each other. Decreasing one area error will increase the other area's error due to the monotonic growth of $P_E^{\max}(A)$ and the finite tree size. Therefore, maximizing the product of all the terms tends to a solution where the area errors are as close to each other as possible. Minimizing the maximum error often leads to such a solution, since we must trade off one area error for another.


G. Proof of Theorem 1 

We prove Theorem 1 by showing the following: 

1) The objective function $P_E^{\max}(A)$ monotonically increases for nested areas.

2) Algorithm 2 will recover the solution to OPT-3.


First we prove Proposition 1 (stated as Lemma 1 below) and state a conjecture shown to hold in large scale simulation experiments. These are needed to prove Lemma 2, which is needed to prove Theorem 1.

First consider the following: 

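Definition 1. For a single pairwise test, $H_i$ vs $H_j$, we have the following decision region:

$$r_{i,j} = \{s \in \mathbb{R} : \Pr(s \mid H_i) > \Pr(s \mid H_j)\}. \tag{52}$$

The observation space is therefore partitioned into two regions. So the detector is the following:

$$\hat{H} = \begin{cases} H_i, & s \in r_{i,j} \\ H_j, & s \notin r_{i,j}. \end{cases} \tag{53}$$

For the one-to-many ML test, $H_i$ vs $\forall H_j \in \mathcal{H}$, we have an acceptance region defined as:

$$R_{\mathcal{H}}(H_i) = \{s \in \mathbb{R}^M : \Pr(s \mid H_i) > \Pr(s \mid H_j)\ \forall H_j \in \mathcal{H}\}. \tag{54}$$

Lemma 1. An equivalent definition is

$$R_{\mathcal{H}}(H_i) = \bigcap_{j : H_j \in \mathcal{H}} r_{i,j}. \tag{55}$$

Proof: Using the definition of the right hand side, we have

$$
\begin{aligned}
\bigcap_{j : H_j \in \mathcal{H}} r_{i,j} &= \{s : s \in r_{i,1} \cap \ldots \cap s \in r_{i,N}\} \\
&= \{s : \Pr(s \mid H_i) > \Pr(s \mid H_1) \cap \ldots \cap \Pr(s \mid H_i) > \Pr(s \mid H_N)\} \\
&= \{s : \Pr(s \mid H_i) > \Pr(s \mid H_j)\ \forall H_j \in \mathcal{H}\} \\
&= R_{\mathcal{H}}(H_i).
\end{aligned}
$$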


Note that this statement is for the maximum likelihood classifier. Under an arbitrary classifier, a procedure of constructing a one-to-many classifier will lead to ambiguity. See 0 (pg. 183) for discussion.

Conjecture 1. Given an ML detection problem with a set of hypotheses of the form $s_k \sim N(\mu_k, \sigma_k^2 + \Delta)$ for $k = 1, \ldots, K$, the missed detection error for each hypothesis will monotonically increase w.r.t. $\Delta$.

This is seen to hold with 200,000 random problem instantiations. We now state Lemma 2.
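
A single instance of this kind of check can be sketched numerically. The code below is not the experiment reported above: it is a grid-based approximation with hypothetical function names, evaluating the missed detection probability of each hypothesis for a scalar ML detector with Gaussian hypotheses, after adding a common variance offset $\Delta$.

```python
import math
from typing import List, Sequence


def gaussian_pdf(s: float, mean: float, var: float) -> float:
    """Density of N(mean, var) at s."""
    return math.exp(-((s - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def missed_detection(
    means: Sequence[float],
    variances: Sequence[float],
    delta: float = 0.0,
    n_grid: int = 20001,
    n_sigma: float = 8.0,
) -> List[float]:
    """Approximate the per-hypothesis missed detection of a scalar ML detector.

    Each hypothesis k is s ~ N(means[k], variances[k] + delta). The real line
    is discretized; each grid point is assigned to the hypothesis with the
    largest density, and that density is integrated over its own region.
    """
    var = [v + delta for v in variances]
    lo = min(m - n_sigma * math.sqrt(v) for m, v in zip(means, var))
    hi = max(m + n_sigma * math.sqrt(v) for m, v in zip(means, var))
    step = (hi - lo) / (n_grid - 1)
    correct = [0.0] * len(means)
    for i in range(n_grid):
        s = lo + i * step
        dens = [gaussian_pdf(s, m, v) for m, v in zip(means, var)]
        k = max(range(len(dens)), key=dens.__getitem__)  # ML decision at s
        correct[k] += dens[k] * step
    return [max(0.0, 1.0 - c) for c in correct]


if __name__ == "__main__":
    for delta in (0.0, 0.5, 1.0):
        print(delta, max(missed_detection([0.0, 2.0], [1.0, 1.0], delta)))
```

For two hypotheses with equal variance the result can be compared with the closed form $Q\left(|\mu_1 - \mu_2| / (2\sqrt{\sigma^2 + \Delta})\right)$, which increases with $\Delta$.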

Definition 2. Two area networks are nested, $A \subset A'$, if the vertices of each area, $V$ and $V'$, are such that $V \subset V'$.

Lemma 2. Given two area networks $A$ and $A'$ where $A \subset A'$, $P_E^{\max}(A) \le P_E^{\max}(A')$.




Fig. 11. 11(a) Case 1 showing growth by adding new nodes by moving the terminal sensor down. The conditional pdf $\Delta s \mid H\ \forall H \in \mathcal{H}$ does not change. However, the acceptance region shrinks from $R_{\mathcal{H}}$ to $R'_{\mathcal{H} \cup X_1}$. 11(b) Case 2 showing growth by adding new nodes by moving the root sensor up. The conditional pdf $\Delta s' \mid H\ \forall H \in \mathcal{H}$ will change. Again, the acceptance region shrinks from $R_{\mathcal{H}}$ to $R'_{\mathcal{H} \cup X_2}$.


Proof: Let $\alpha^* = P_E^{\max}(A)$ be the maximum missed detection probability of the local hypotheses in area $A$, attained at some hypothesis $H^*$. We aim to show that for an enlarged area $A'$, the error probability for $H^*$ will always be larger, which is quite intuitive. Therefore the maximum error $P_E^{\max}(A')$ is at least as large, regardless of whether $H^*$ maximizes the missed detection in the enlarged area.

Expansion of $A$ is analyzed in two cases:

Case 1: Terminal sensors expand downstream (away from $v_0$), shown in Figure 11(a).

Case 2: Root sensor of $A$ moves upstream (closer to $v_0$), shown in Figure 11(b).
We use the following shorthand for the hypotheses of areas $A$ and $A'$: $\mathcal{H} = \mathcal{H}(A)$, $\mathcal{H} \cup X_i = \mathcal{H}(A')$. The sets $X_i$ differ in how the area is enlarged, where $i = 1, 2$ for case 1 and case 2. Therefore under cases 1 and 2 we have $R_{\mathcal{H} \cup X_1}(H_i)$ and $R_{\mathcal{H} \cup X_2}(H_i)$. Now we show how the effective measurement distribution (as defined in (28)) and acceptance regions change under each case.

Case 1: For all hypotheses $H_i \in \mathcal{H}$ we have that $\mu'_i = \mu_i$ and $\sigma'^2_i = \sigma_i^2$. The distribution $\Delta s \mid H_i$ is unchanged. Given that $R_{\mathcal{H}}(H_i) = \bigcap_{j : H_j \in \mathcal{H}} r_{i,j}$, the new acceptance region is $R_{\mathcal{H} \cup X_1}(H_i) = \bigcap_{j : H_j \in \mathcal{H} \cup X_1} r_{i,j}$, with $r_{i,j}$ from the original alternatives unchanged.

Case 2: For all hypotheses $H_i \in \mathcal{H}$ we have that $\mu'_i = \mu_e + \mu_i$ and $\sigma'^2_i = \sigma_e^2 + \sigma_i^2$. The distribution and acceptance region change: $\Delta s \mid H_i \rightarrow \Delta s' \mid H_i$. The new acceptance region is $R_{\mathcal{H} \cup X_2}(H_i) = \bigcap_{j : H_j \in \mathcal{H} \cup X_2} r'_{i,j}$, where the acceptance regions under the previous area alternatives are now different.

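To see why the distribution $\Delta s \mid H_i$ changes, first recall that:

$$
\begin{aligned}
\Delta s_i \mid H_k &\sim N\left(\sum_{v \in V_i \setminus v_k} \mu(v) - \mu_T,\ \sum_{v \in V_i \setminus v_k} \sigma^2(v) - \sigma_T^2\right) && (56) \\
&\sim N(\mu_k, \sigma_k^2)\ \forall H_k \in \mathcal{H}. && (57)
\end{aligned}
$$

The terms $\mu_T$ and $\sigma_T^2$ are the sums of load forecasts and variances of all terminal sensors. Now moving the root node upstream leads to:

$$
\begin{aligned}
\Delta s'_i \mid H_k &\sim N\left(\sum_{v \in V' \setminus v_k} \mu(v) - \mu_T,\ \sum_{v \in V' \setminus v_k} \sigma^2(v) - \sigma_T^2\right) && (58) \\
&\sim N(\mu_e + \mu_k, \sigma_e^2 + \sigma_k^2)\ \forall H_k \in \mathcal{H}. && (59)
\end{aligned}
$$

Therefore changing the position of $s_i$ so as to add additional vertices will increase every original hypothesis mean and variance by the same amount.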

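Now consider Case 1 first, where we merely add new alternatives, keeping the distributions of $\Delta s \mid H$ unchanged. Here we have:

$$
\begin{aligned}
&\Pr(\Delta s \in R_{\mathcal{H} \cup X}(H^*) \mid H^* \text{ true}) && (60) \\
&= \Pr\Big(\Delta s \in \bigcap_{j : H_j \in \mathcal{H} \cup X} r_{*,j} \,\Big|\, H^* \text{ true}\Big) && (61) \\
&= \Pr\Big(\Big\{\Delta s \in \bigcap_{j : H_j \in \mathcal{H}} r_{*,j}\Big\} \cap \Big\{\Delta s \in \bigcap_{j : H_j \in X} r_{*,j}\Big\} \,\Big|\, H^* \text{ true}\Big) && (62) \\
&\le \Pr\Big(\Delta s \in \bigcap_{j : H_j \in \mathcal{H}} r_{*,j} \,\Big|\, H^* \text{ true}\Big) && (63) \\
&= \Pr(\Delta s \in R_{\mathcal{H}}(H^*) \mid H^* \text{ true}). && (64)
\end{aligned}
$$

Line (61) defines the region $R_{\mathcal{H} \cup X}(H^*)$ using Proposition 1 (Lemma 1). This is split into the intersection of two separate events using our definitions of $\mathcal{H}$ and $\mathcal{H} \cup X$. Next we use the fact that $\Pr(E_1 \cap E_2) \le \Pr(E_1)$ for any two events $E_1$ and $E_2$. Therefore, since $\Pr(\Delta s \in R_{\mathcal{H} \cup X_1}(H^*) \mid H^* \text{ true}) \le \Pr(\Delta s \in R_{\mathcal{H}}(H^*) \mid H^* \text{ true})$, the probability of correct detection cannot increase and $P_E^{\max}(A) \le P_E^{\max}(A')$.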

We next prove Case 2, where not only are more alternatives considered for $H^*$, but $\Delta s \mid H^*$ is translated by the fixed amounts $\mu_e$, $\sigma_e^2$ in mean and variance. This implies that:

$$
\begin{aligned}
&\Pr(\Delta s' \in R_{\mathcal{H} \cup X}(H^*) \mid H^* \text{ true}) \\
&\le \Pr(\Delta s' \in R_{\mathcal{H}}(H^*) \mid H^* \text{ true}) && (65) \\
&= \Pr(\Delta s' - \mu_e \in R_{\mathcal{H}}(H^*) - \mu_e \mid H^* \text{ true}) && (66) \\
&\le \Pr(\Delta s \in R_{\mathcal{H}}(H^*) \mid H^* \text{ true}). && (67)
\end{aligned}
$$

The inequality in line (65) uses the identical procedure as in (61)-(63). In line (66) we are merely shifting the Gaussian $\Delta s'$ and the acceptance region by $\mu_e$, using a shorthand notation. This can be done since the MAP test is scalar. Finally, the inequality in line (67) follows from Conjecture 1. ■

We can now prove Theorem 1.

Proof: The bottom up solution $\mathcal{M}^g$ moves toward the root node, enlarging each area network so that each $P_E^{\max}(A) < P^{\text{target}}$, while moving any further up would violate the target. Suppose some other method produces a solution $\mathcal{M}'$ which satisfies the error constraint with fewer sensors, so that $|\mathcal{M}'| < |\mathcal{M}^g|$. This implies that some area must increase in size as compared to the $\mathcal{M}^g$ solution, and from Lemma 2 that area will violate the error constraint. ■


H. Greedy and Optimal Tree Action 

optimal-tree-action: The correct action at a node-junction is to enumerate each of the $2^{|\text{child}(v)|} - 1$ possible trees and process them until only one remains closest to the root. In the example in Figure 4 we must proceed toward the root node, processing both $A_2$ and $A_3$ in parallel as separate problem instances, each with its own $v_t$ and $\mathcal{M}^g$. This is in contrast to the greedy strategy, which chooses one placement and moves on. Each problem instance is then processed until the area objective function violates $P^{\text{target}}$. Only one problem instance is kept: the one where $v_t$ was closest to the root vertex.

It turns out that the greedy-tree-action and the optimal-tree-action procedures are in practice extremely close, as discussed below. Algorithm 2 implements only the greedy technique, since we do not grow the search space with multiple bottom up scenarios.

The bottom up placement using greedy-tree-action relies on moving the current node as close to the root as possible while keeping the error below $P^{\text{target}}$. At a junction, recall that the current network is chosen as the one which minimizes the maximum error, and the algorithm continues with that choice. To see why this is suboptimal, consider Figure 12, which shows a typical subtree where we have the following events:

1) (Figure 12(a)) The bottom up algorithm will evaluate $P_E^{\max}(A_1^L) < P^{\text{target}}$ and $P_E^{\max}(A_1^R) < P^{\text{target}}$.

2) (Figure 12(b)) Move on to evaluating the combined area network with the parent node. Evaluating $P_E^{\max}(A_2) > P^{\text{target}}$, we must choose in a greedy manner via greedy-tree-action.

3) (Figure 12(c)) Evaluating both $P_E^{\max}(A_3^R)$ and $P_E^{\max}(A_3^L)$, where $P_E^{\max}(A_3^R) < P_E^{\max}(A_3^L) < P^{\text{target}}$, the greedy choice will keep $A_3^R$ and discard $A_3^L$.

4) (Figure 12(d)) The optimal choice is in fact $A_3^L$ since $P_E^{\max}(A_4^L) < P^{\text{target}} < P_E^{\max}(A_4^R)$.

The greedy choice will choose $A_3^R$ and be forced to place a sensor in $A_4^R$, while the optimal can continue upstream. The following numerical example will lead to this: $\mu_i = 1\ \forall i$ and $\sigma^2 = \{0.0599, 0.0125, 0.0835, 0.0945, 0.0906, 0.0607\}$ with $P^{\text{target}} = 0.1923$.
Fig. 12. 12(a) Both areas have maximum error smaller than the target error: $P_E^{\max}(A_1^L) < P^{\text{target}}$ and $P_E^{\max}(A_1^R) < P^{\text{target}}$. 12(b) The combined area violates the target: $P_E^{\max}(A_2) > P^{\text{target}}$. 12(c) The greedy choice will choose $A_3^R$ as the candidate: $P_E^{\max}(A_3^R) < P_E^{\max}(A_3^L) < P^{\text{target}}$. 12(d) The correct choice is $A_3^L$ since $P_E^{\max}(A_4^L) < P^{\text{target}} < P_E^{\max}(A_4^R)$.


The triplets of scalar hypotheses which cause this are shown in Table III.


Fig. 13. The maximum probability of error over three hypotheses as a function of a translation $\Delta$ of each hypothesis variance (curves: $\sigma_L^2 + \Delta$, $\sigma_R^2 + \Delta$ and $P^{\text{target}}$). At the point $\Delta = 0$, $P(A^R) < P(A^L)$, and at the point $\Delta = 0.0599$, $P(A^R) > P(A^L)$.


This can be seen in Figure 13(a), where the maximum hypothesis error for each of the tuples is shown with respect to the translation $\Delta$. As indicated, any $\Delta > 0.02$ will cause their ordering to change, with the counterexample shown to be beyond this point.

However, we should note that although the transition does occur, the gap is quite small, so any realistic gap will be very small. Since $P^{\text{target}}$ must lie between $P(A^L)$ and $P(A^R)$, this is very unlikely to occur. In searching for a counterexample, 10000 Monte Carlo runs produced 5 examples. For this reason, the greedy and optimal placement strategies will almost always have identical outcomes on random instances.


TABLE III

PNNL Test feeders used in case study

Area      Flow                  $\mathcal{H}^+(A, b)$        $\mu$      $\sigma^2$                 $P_E^{\max}(A)$
$A_3^L$   $s_2 > 0$, $s_5 > 0$  $e_3$, $e_4$, $\emptyset$    1, 2, 3    0.0125, 0.1031, 0.1637     0.1083
$A_3^R$   $s_2 > 0$, $s_3 > 0$  $e_5$, $e_6$, $\emptyset$    1, 2, 3    0.0125, 0.0960, 0.1905     0.0961
$A_4^L$   $s_1 > 0$, $s_3 > 0$  $e_3$, $e_4$, $\emptyset$    2, 3, 4    0.0724, 0.1558, 0.2504     0.1885
$A_4^R$   $s_1 > 0$, $s_3 > 0$  $e_5$, $e_6$, $\emptyset$    2, 3, 4    0.0724, 0.1630, 0.2236     0.1960


Notice that in the hypothesis means and variances, we have the following:

$$\mu_k(A_4^R) = \mu(v_1) + \mu_k(A_3^R)$$
$$\mu_k(A_4^L) = \mu(v_1) + \mu_k(A_3^L)$$
$$\sigma_k^2(A_4^R) = \Delta + \sigma_k^2(A_3^R)$$
$$\sigma_k^2(A_4^L) = \Delta + \sigma_k^2(A_3^L)$$

where $\mu(v_1) = 1$ and $\Delta = \sigma^2(v_1) = 0.0599$. The translation of mean and variance causes the maximum error over an area to switch from $A^L$ to $A^R$. Recall that Conjecture 1 stated that the maximum error in such a case will monotonically increase with respect to a translation in variances. Regardless, there is no domination between any pair of triplets, whereby one would always be greater than another under the same translation.






































